Abstract

Motivated, in part, by the desire to develop an information-theoretic foundation for compound Poisson approximation limit theorems (analogous to the corresponding developments for the central limit theorem and for simple Poisson approximation), this work examines sufficient conditions under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. We show that the natural analog of the Poisson maximum entropy property remains valid if the measures under consideration are log-concave, but that it fails in general. A parallel maximum entropy result is established for the family of compound binomial measures. The proofs are largely based on ideas related to the semigroup approach introduced in recent work by Johnson [12] for the Poisson family. Sufficient conditions are given for compound distributions to be log-concave, and specific examples are presented illustrating all the above results.
1 Introduction

A particularly appealing way to state the classical central limit theorem is to say that, if $X_1, X_2, \ldots$ are independent and identically distributed, continuous random variables with zero mean and unit variance, then the entropy of their normalized partial sums $S_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i$ increases with $n$ to the entropy of the standard normal distribution, which is maximal among all random variables with zero mean and unit variance. More precisely, if $f_n$ denotes the density of $S_n$ and $\phi$ the standard normal density, then, as $n \to \infty$,

$$h(f_n) \uparrow h(\phi) = \sup\{h(f) : \text{densities } f \text{ with mean 0 and variance 1}\}, \tag{1}$$

where $h(f) = -\int f \log f$ denotes the differential entropy and log denotes the natural logarithm. Precise conditions under which (1) holds are given in [l][25][20]; also see [19][1][11] and the references therein, where numerous related results are stated, along with their history.

Part of the appeal of this formalization of the central limit theorem comes from its analogy to the second law of thermodynamics: The "state" (meaning the distribution) of the random variables $S_n$ evolves monotonically, until the maximum entropy state, the standard normal distribution, is reached. Moreover, the introduction of information-theoretic ideas and techniques in connection with the entropy has motivated numerous related results (and their proofs), generalizing and strengthening the central limit theorem in different directions; see the references mentioned above for details.

The classical Poisson convergence limit theorems, of which the binomial-to-Poisson is the prototypical example, have also been examined under a similar light. An analogous program has been recently carried out in this case [23][14][9][18][12]. The starting point is the identification of the Poisson distribution as that which has maximal entropy within a natural class of probability measures. Perhaps the simplest way to state and prove this is along the following lines; first we make some simple definitions:

Definition 1.1 For any parameter vector $p = (p_1, p_2, \ldots, p_n)$ with each $p_i \in [0,1]$, the sum of independent Bernoulli random variables $B_i \sim \mathrm{Bern}(p_i)$,

$$S_n = \sum_{i=1}^n B_i,$$

is called a Bernoulli sum, and its probability mass function is denoted by $b_p(x) := \Pr\{S_n = x\}$, for $x = 0, 1, \ldots$. Further, for each $\lambda > 0$, we define the following sets of parameter vectors:

$$\mathcal{P}_n(\lambda) = \{p \in [0,1]^n : p_1 + p_2 + \cdots + p_n = \lambda\} \quad\text{and}\quad \mathcal{P}_\infty(\lambda) = \bigcup_{n \geq 1} \mathcal{P}_n(\lambda).$$



Shepp and Olkin [23] showed that, for fixed $n \geq 1$, the Bernoulli sum $b_p$ which has maximal entropy among all Bernoulli sums with mean $\lambda$, is $\mathrm{Bin}(n, \lambda/n)$, the binomial with parameters $n$ and $\lambda/n$,

$$H(\mathrm{Bin}(n, \lambda/n)) = \max\{H(b_p) : p \in \mathcal{P}_n(\lambda)\}, \tag{2}$$

where $H(P) = -\sum_x P(x) \log P(x)$ denotes the discrete entropy function. Noting that the binomial $\mathrm{Bin}(n, \lambda/n)$ converges to the Poisson distribution $\mathrm{Po}(\lambda)$ as $n \to \infty$, and that the classes of Bernoulli sums in (2) are nested, $\{b_p : p \in \mathcal{P}_n(\lambda)\} \subset \{b_p : p \in \mathcal{P}_{n+1}(\lambda)\}$, Harremoës [9] noticed that a simple limiting argument gives the following maximum entropy property for the Poisson distribution:

$$H(\mathrm{Po}(\lambda)) = \sup\{H(b_p) : p \in \mathcal{P}_\infty(\lambda)\}. \tag{3}$$

Partly motivated by the desire to provide an information-theoretic foundation for compound Poisson limit theorems and the more general problem of compound Poisson approximation,¹ as a first step we consider the problem of generalizing the maximum entropy properties (2) and (3) to the case of compound Poisson distributions on $\mathbb{Z}_+$. We begin with some definitions:

Definition 1.2 Let $P$ be an arbitrary distribution on $\mathbb{Z}_+ = \{0, 1, \ldots\}$, and $Q$ a distribution on $\mathbb{N} = \{1, 2, \ldots\}$. The Q-compound distribution $C_Q P$ is the distribution of the random sum,

$$\sum_{j=1}^{Y} X_j, \tag{4}$$

where $Y$ has distribution $P$ and the random variables $\{X_j\}$ are independent and identically distributed (i.i.d.) with common distribution $Q$ and independent of $Y$. The distribution $Q$ is called a compounding distribution, and the map $P \mapsto C_Q P$ is the Q-compounding operation. The Q-compound distribution $C_Q P$ can be explicitly written as the mixture,

$$C_Q P(x) = \sum_{y=0}^{\infty} P(y)\, Q^{*y}(x), \qquad x \geq 0, \tag{5}$$

where $Q^{*j}(x)$ is the $j$th convolution power of $Q$ and $Q^{*0}$ is the point mass at $x = 0$.

Above and throughout the paper, the empty sum $\sum_{j=1}^{0}(\cdots)$ is taken to be zero; all random variables considered are supported on $\mathbb{Z}_+ = \{0, 1, \ldots\}$; and all compounding distributions $Q$ are supported on $\mathbb{N} = \{1, 2, \ldots\}$.
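The mixture formula (5) translates directly into a short computation. The following is a minimal sketch in pure Python, assuming mass functions are stored as lists indexed from 0 (so `Q[0]` is 0); the helper names `convolve` and `compound` are illustrative and do not come from the paper.

```python
def convolve(a, b, size):
    """Convolve two mass functions (lists indexed by 0, 1, ...), keeping `size` terms."""
    out = [0.0] * size
    for i, ai in enumerate(a[:size]):
        for j, bj in enumerate(b[:size - i]):
            out[i + j] += ai * bj
    return out


def compound(P, Q, size):
    """Return [C_Q P(0), ..., C_Q P(size - 1)] via the mixture formula (5).

    P[y] is P(y) for y = 0, 1, ...; Q[x] is Q(x), with Q[0] == 0 since Q lives on N.
    """
    power = [1.0] + [0.0] * (size - 1)    # Q^{*0}, the point mass at 0
    out = [0.0] * size
    for py in P:
        for x in range(size):
            out[x] += py * power[x]
        power = convolve(power, Q, size)  # advance to Q^{*(y+1)}
    return out


# Compounding a Bernoulli(p) mass function with an arbitrary Q on {1, 2, 3}.
p = 0.3
Q = [0.0, 0.2, 0.5, 0.3]
cbern = compound([1 - p, p], Q, 5)
```

Because $Q$ is supported on $\mathbb{N}$, $Q^{*y}(x) = 0$ whenever $y > x$, so truncating every list to `size` terms loses nothing for $x <$ `size`.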

Example 1.3 Let $Q$ be an arbitrary distribution on $\mathbb{N}$.

1. For any $0 \leq p \leq 1$, the compound Bernoulli distribution $\mathrm{CBern}(p, Q)$ is the distribution of the product $BX$, where $B \sim \mathrm{Bern}(p)$ and $X \sim Q$ are independent. It has probability mass function $C_Q P$, where $P$ is the $\mathrm{Bern}(p)$ mass function, so that $C_Q P(0) = 1 - p$ and $C_Q P(x) = pQ(x)$ for $x \geq 1$.

¹ Recall that the compound Poisson distributions are the only infinitely divisible distributions on $\mathbb{Z}_+$, and also they are (discrete) stable laws [24]. In the way of motivation we also recall Gnedenko and Korolev's remark that "there should be mathematical ... probabilistic models of the universal principle of non-decrease of uncertainty," and their proposal that we should "find conditions under which certain limit laws appearing in limit theorems of probability theory possess extremal entropy properties. Immediate candidates to be subjected to such analysis are, of course, stable laws ..."; see [8, pp. 211-215].
2. A compound Bernoulli sum is a sum of independent compound Bernoulli random variables, all with respect to the same compounding distribution $Q$: Let $X_1, X_2, \ldots, X_n$ be i.i.d. with common distribution $Q$ and $B_1, B_2, \ldots, B_n$ be independent $\mathrm{Bern}(p_i)$. We call,

$$\sum_{i=1}^{n} B_i X_i \overset{\mathcal{D}}{=} \sum_{j=1}^{\sum_{i=1}^n B_i} X_j,$$

a compound Bernoulli sum; in view of (4), its distribution is $C_Q b_p$, where $p = (p_1, p_2, \ldots, p_n)$.

3. In the special case of a compound Bernoulli sum with all its parameters $p_i = p$ for a fixed $p \in [0, 1]$, we say that it has a compound binomial distribution, denoted by $\mathrm{CBin}(n, p, Q)$.

4. Let $\Pi_\lambda(x) = e^{-\lambda}\lambda^x/x!$, $x \geq 0$, denote the $\mathrm{Po}(\lambda)$ mass function. Then, for any $\lambda > 0$, the compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ is the distribution with mass function $C_Q \Pi_\lambda$:

$$C_Q \Pi_\lambda(x) = \sum_{j=0}^{\infty} \Pi_\lambda(j)\, Q^{*j}(x) = \sum_{j=0}^{\infty} \frac{e^{-\lambda}\lambda^j}{j!}\, Q^{*j}(x), \qquad x \geq 0. \tag{6}$$

In view of the Shepp-Olkin maximum entropy property (2) for the binomial distribution, a first natural conjecture might be that the compound binomial has maximum entropy among all compound Bernoulli sums $C_Q b_p$ with a fixed mean; that is,

$$H(\mathrm{CBin}(n, \lambda/n, Q)) = \max\{H(C_Q b_p) : p \in \mathcal{P}_n(\lambda)\}. \tag{7}$$

But, perhaps somewhat surprisingly, as Chi [6] has noted, (7) fails in general. For example, taking $Q$ to be the uniform distribution on $\{1, 2\}$, $p = (0.00125, 0.00875)$ and $\lambda = p_1 + p_2 = 0.01$, direct computation shows that,

$$H(\mathrm{CBin}(2, \lambda/2, Q)) < 0.090798 < 0.090804 < H(C_Q b_p). \tag{8}$$

As the Shepp-Olkin result (2) was only seen as an intermediate step in proving the maximum entropy property of the Poisson distribution (3), we may still hope that the corresponding result remains true for compound Poisson measures, namely that,

$$H(\mathrm{CPo}(\lambda, Q)) = \sup\{H(C_Q b_p) : p \in \mathcal{P}_\infty(\lambda)\}. \tag{9}$$

Again, (9) fails in general. For example, taking the same $Q$, $\lambda$ and $p$ as above, yields,

$$H(\mathrm{CPo}(\lambda, Q)) < 0.090765 < 0.090804 < H(C_Q b_p).$$
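Both counterexamples are easy to reproduce numerically. The sketch below is an illustration only: the helper names are hypothetical, and it assumes the quoted decimals are entropies in bits (base-2 logarithm), which is the unit that appears to match them; the ordering of the three entropies does not depend on the base.

```python
from math import exp, factorial, log2


def convolve(a, b, size):
    """Convolve two mass functions (lists indexed by 0, 1, ...), keeping `size` terms."""
    out = [0.0] * size
    for i, ai in enumerate(a[:size]):
        for j, bj in enumerate(b[:size - i]):
            out[i + j] += ai * bj
    return out


def compound(P, Q, size):
    """Mass function of C_Q P on {0, ..., size - 1}, as a mixture of convolution powers."""
    power = [1.0] + [0.0] * (size - 1)
    out = [0.0] * size
    for py in P:
        for x in range(size):
            out[x] += py * power[x]
        power = convolve(power, Q, size)
    return out


def bernoulli_sum(ps):
    """Mass function b_p of a sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in ps:
        pmf = convolve(pmf, [1.0 - p, p], len(pmf) + 1)
    return pmf


def entropy_bits(pmf):
    """Discrete entropy with base-2 logarithm (an assumption about the quoted figures)."""
    return -sum(v * log2(v) for v in pmf if v > 0)


Q = [0.0, 0.5, 0.5]          # uniform on {1, 2}
lam, size = 0.01, 60         # size = truncation point, ample for lam = 0.01
h_sum = entropy_bits(compound(bernoulli_sum([0.00125, 0.00875]), Q, size))
h_cbin = entropy_bits(compound(bernoulli_sum([lam / 2, lam / 2]), Q, size))
poisson = [exp(-lam) * lam**j / factorial(j) for j in range(size)]
h_cpo = entropy_bits(compound(poisson, Q, size))
print(h_cbin, h_cpo, h_sum)
```

With these inputs the compound Bernoulli sum has strictly larger entropy than both the compound binomial and the compound Poisson distribution with the same mean.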

The main purpose of the present work is to show that, despite these negative results, it is possible to provide natural, broad sufficient conditions, under which the compound binomial and compound Poisson distributions can be shown to have maximal entropy in an appropriate class of measures. Our first result, Theorem 1.4 below, states that (7) does hold, under certain conditions on $Q$ and $\mathrm{CBin}(n, \lambda/n, Q)$:

Theorem 1.4 If the distribution $Q$ on $\mathbb{N}$ and the compound binomial distribution $\mathrm{CBin}(n, \lambda/n, Q)$ are both log-concave, then,

$$H(\mathrm{CBin}(n, \lambda/n, Q)) = \max\{H(C_Q b_p) : p \in \mathcal{P}_n(\lambda)\},$$

as long as the tail of $Q$ satisfies either one of the following properties: (a) $Q$ has finite support; or (b) $Q$ has tails heavy enough so that, for some $\rho, \beta > 0$ and $N_0 \geq 1$, we have, $Q(x) \geq \rho^{x^\beta}$, for all $x \geq N_0$.

The proof of the theorem is given in Section 3. As can be seen there, conditions (a) and (b) are introduced purely for technical reasons, and can probably be significantly relaxed. The notion of log-concavity, on the other hand, is central in the development of the ideas in this work. [In a different setting, log-concavity also appears as a natural condition for a different maximum entropy problem considered by Cover and Zhang [7].] Recall that the distribution $P$ of a random variable $X$ on $\mathbb{Z}_+$ is log-concave if its support is a (possibly infinite) interval of successive integers in $\mathbb{Z}_+$, and,

$$P(x)^2 \geq P(x+1)P(x-1), \qquad \text{for all } x \geq 1. \tag{10}$$

We also recall that most of the commonly used distributions appearing in applications (e.g., the Poisson, binomial, geometric, negative binomial, hypergeometric, logarithmic series, or Polya-Eggenberger distribution) are log-concave.

Another key property is that of ultra log-concavity; cf. [22]. The distribution $P$ of a random variable $X$ is ultra log-concave if $P(x)/\Pi_\lambda(x)$ is log-concave, that is, if,

$$xP(x)^2 \geq (x+1)P(x+1)P(x-1), \qquad \text{for all } x \geq 1. \tag{11}$$

Note that the Poisson distribution as well as all Bernoulli sums are ultra log-concave.
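Conditions (10) and (11) can be checked mechanically on a finite mass function. The sketch below assumes the distribution is given as a list of probabilities indexed from 0; the function names and the small relative tolerance `tol` (used to absorb floating-point rounding when a condition holds with equality) are illustrative choices.

```python
from math import comb, exp, factorial


def is_log_concave(p, tol=1e-12):
    """Check that the support is an interval and that (10) holds at every interior x."""
    support = [x for x, v in enumerate(p) if v > 0]
    if not support or support[-1] - support[0] + 1 != len(support):
        return False
    return all(p[x] ** 2 * (1 + tol) >= p[x + 1] * p[x - 1]
               for x in range(1, len(p) - 1))


def is_ultra_log_concave(p, tol=1e-12):
    """Check log-concavity together with the stronger inequality (11)."""
    return is_log_concave(p, tol) and all(
        x * p[x] ** 2 * (1 + tol) >= (x + 1) * p[x + 1] * p[x - 1]
        for x in range(1, len(p) - 1))


binomial = [comb(5, x) * 0.3**x * 0.7**(5 - x) for x in range(6)]
poisson = [exp(-3.0) * 3.0**x / factorial(x) for x in range(30)]   # truncated Po(3)
geometric = [0.5 * 0.5**x for x in range(20)]                      # truncated, on Z_+
bimodal = [0.4, 0.1, 0.5]
```

The binomial and (truncated) Poisson lists pass both checks, the geometric list is log-concave but not ultra log-concave, and the bimodal list fails (10).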

Johnson [12] recently proved the following maximum entropy property for the Poisson distribution, generalizing (3):

$$H(\mathrm{Po}(\lambda)) = \max\{H(P) : \text{ultra log-concave } P \text{ with mean } \lambda\}. \tag{12}$$

Our next result (proved in Section 2) states that, as long as $Q$ and the compound Poisson measure $\mathrm{CPo}(\lambda, Q)$ are log-concave, the same maximum entropy statement as in (12) remains valid in the compound Poisson case:

Theorem 1.5 If the distribution $Q$ on $\mathbb{N}$ and the compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ are both log-concave, then,

$$H(\mathrm{CPo}(\lambda, Q)) = \max\{H(C_Q P) : \text{ultra log-concave } P \text{ with mean } \lambda\}.$$

In Section 4 we give conditions under which the compound Poisson and compound Bernoulli distributions are log-concave. In particular, the results there imply the following explicit maximum entropy statements.
Example 1.6

1. Let $Q$ be an arbitrary log-concave distribution on $\mathbb{N}$. Then Lemma 4.1 combined with Theorem 1.4 implies that the maximum entropy property of the compound binomial distribution in equation (7) holds, for all $\lambda$ large enough. That is, the compound binomial $\mathrm{CBin}(n, \lambda/n, Q)$ has maximal entropy among all compound Bernoulli sums $C_Q b_p$ with $p_1 + p_2 + \cdots + p_n = \lambda$, as long as $\lambda > \frac{nQ(2)}{Q(1)^2 + Q(2)}$.

2. Suppose $Q$ is supported on $\{1, 2\}$, with probabilities $Q(1) = q$, $Q(2) = 1 - q$, and consider the class of all Bernoulli sums $b_p$ with mean $p_1 + p_2 + \cdots + p_n = \lambda$. Theorem 4.2 combined with Theorem 1.5 implies that the compound Poisson maximum entropy property (9) holds in this case, as long as $\lambda$ is large enough. In other words, the distribution $\mathrm{CPo}(\lambda, Q)$ has maximal entropy among all compound Bernoulli sums $C_Q b_p$ with $p_1 + p_2 + \cdots + p_n = \lambda > \frac{2(1-q)}{q^2}$.

3. Suppose $Q$ is geometric with parameter $\alpha \in (0, 1)$, i.e., $Q(x) = \alpha(1-\alpha)^{x-1}$ for all $x \geq 1$, and again consider the class of all Bernoulli sums $b_p$ with mean $\lambda$. Then Theorem 4.4 combined with Theorem 1.5 implies that (9) holds for all large $\lambda$: the compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ has maximal entropy among all compound Bernoulli sums $C_Q b_p$ with $p_1 + p_2 + \cdots + p_n = \lambda > \frac{2(1-\alpha)}{\alpha}$.



Clearly, it remains an open question to give necessary and sufficient conditions on $\lambda$ and $Q$ for the compound Poisson and compound binomial distributions to have maximal entropy within an appropriately defined class, or even for the compound Poisson distribution to be log-concave. Section 4 ends with a conjecture, together with some supporting evidence, stating that $\mathrm{CPo}(\lambda, Q)$ is log-concave when $Q$ is log-concave and $\lambda Q(1)^2 \geq 2Q(2)$.
2 Maximum Entropy Property of the Compound Poisson Distribution

Here we show that, if $Q$ and the compound Poisson distribution $\mathrm{CPo}(\lambda, Q) = C_Q \Pi_\lambda$ are both log-concave, then $\mathrm{CPo}(\lambda, Q)$ has maximum entropy among all distributions of the form $C_Q P$, when $P$ has mean $\lambda$ and is ultra log-concave. Our approach is an extension of the 'semigroup' arguments of [12].

We begin by recording some basic properties of log-concave and ultra log-concave distributions:

(i) If $P$ is ultra log-concave, then from the definitions it is immediate that $P$ is log-concave.

(ii) If $Q$ is log-concave, then it has finite moments of all orders; see [16, Theorem 7].

(iii) If $X$ is a random variable with ultra log-concave distribution $P$, then (by (i) and (ii)) it has finite moments of all orders. Moreover, considering the covariance between the decreasing function $P(x+1)(x+1)/P(x)$ and the increasing function $x(x-1)\cdots(x-n)$, shows that the falling factorial moments of $P$ satisfy,

$$E[(X)_n] := E[X(X-1)\cdots(X-n+1)] \leq (E[X])^n;$$

see [12] and [10] for details.

(iv) The Poisson distribution and all Bernoulli sums are ultra log-concave.

Recall the following definition from [12]:

Definition 2.1 Given $\alpha \in [0, 1]$ and a random variable $X \sim P$ on $\mathbb{Z}_+$ with mean $\lambda \geq 0$, let $U_\alpha P$ denote the distribution of the random variable,

$$\sum_{i=1}^{X} B_i + Z_{\lambda(1-\alpha)},$$

where the $B_i$ are i.i.d. $\mathrm{Bern}(\alpha)$, $Z_{\lambda(1-\alpha)}$ has distribution $\mathrm{Po}(\lambda(1-\alpha))$, and all random variables are independent of each other and of $X$.

Note that, if $X \sim P$ has mean $\lambda$, then $U_\alpha P$ has the same mean. Also, recall the following useful relation that was established in Proposition 3.6 of [12]: For all $y \geq 0$,

$$\frac{\partial}{\partial\alpha} U_\alpha P(y) = \frac{1}{\alpha}\Big(\lambda\big(U_\alpha P(y) - U_\alpha P(y-1)\big) - \big((y+1)U_\alpha P(y+1) - yU_\alpha P(y)\big)\Big). \tag{13}$$

Next we define another transformation of probability distributions $P$ on $\mathbb{Z}_+$:

Definition 2.2 Given $\alpha \in [0,1]$, a distribution $P$ on $\mathbb{Z}_+$ and a compounding distribution $Q$ on $\mathbb{N}$, let $U_\alpha^Q P$ denote the distribution $C_Q U_\alpha P$:

$$U_\alpha^Q P(x) := C_Q U_\alpha P(x) = \sum_{y=0}^{\infty} U_\alpha P(y)\, Q^{*y}(x), \qquad x \geq 0.$$

An important observation that will be at the heart of the proof of Theorem 1.5 below is that, for $\alpha = 0$, $U_0^Q P$ is simply the compound Poisson measure $\mathrm{CPo}(\lambda, Q)$, while for $\alpha = 1$, $U_1^Q P = C_Q P$. The following lemma, proved in the appendix, gives a rough bound on the third moment of $U_\alpha^Q P$:

Lemma 2.3 Suppose $P$ is an ultra log-concave distribution with mean $\lambda > 0$ on $\mathbb{Z}_+$, and let $Q$ be a log-concave compounding distribution on $\mathbb{N}$. For each $\alpha \in [0,1]$, let $W_\alpha, V_\alpha$ be random variables with distributions $U_\alpha^Q P = C_Q U_\alpha P$ and $C_Q (U_\alpha P)^\#$, respectively, where, for any distribution $R$ with mean $\nu$, we write $R^\#(y) = R(y+1)(y+1)/\nu$ for its size-biased version. Then the third moments $E(W_\alpha^3)$ and $E(V_\alpha^3)$ are both bounded above by,

$$\lambda q_3 + 3\lambda^2 q_1 q_2 + \lambda^3 q_1^3,$$

where $q_1, q_2, q_3$ denote the first, second and third moments of $Q$, respectively.
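As a numerical sanity check on the bound for $W_\alpha$ at the two endpoints, the sketch below compares the third moment of $C_Q P$ for a binomial $P$ (the case $\alpha = 1$) and of the compound Poisson distribution (the case $\alpha = 0$) against the expression in the lemma. The particular $Q$, the binomial parameters and the helper names are illustrative assumptions; the observation that the compound Poisson case meets the bound with equality is a standard moment computation, not a claim taken from the text.

```python
from math import comb, exp, factorial


def convolve(a, b, size):
    """Convolve two mass functions (lists indexed by 0, 1, ...), keeping `size` terms."""
    out = [0.0] * size
    for i, ai in enumerate(a[:size]):
        for j, bj in enumerate(b[:size - i]):
            out[i + j] += ai * bj
    return out


def compound(P, Q, size):
    """Mass function of C_Q P on {0, ..., size - 1}."""
    power = [1.0] + [0.0] * (size - 1)
    out = [0.0] * size
    for py in P:
        for x in range(size):
            out[x] += py * power[x]
        power = convolve(power, Q, size)
    return out


def third_moment(pmf):
    return sum(x**3 * v for x, v in enumerate(pmf))


def moment_bound(lam, Q):
    """lam*q3 + 3*lam^2*q1*q2 + lam^3*q1^3, with q_k the k-th moment of Q."""
    q1, q2, q3 = (sum(x**k * q for x, q in enumerate(Q)) for k in (1, 2, 3))
    return lam * q3 + 3 * lam**2 * q1 * q2 + lam**3 * q1**3


Q = [0.0, 0.6, 0.4]                                   # log-concave on {1, 2}
lam = 2.0
bound = moment_bound(lam, Q)
binom = [comb(4, y) * 0.5**4 for y in range(5)]       # ultra log-concave, mean 2
poisson = [exp(-lam) * lam**y / factorial(y) for y in range(60)]  # truncated Po(2)
m_binom = third_moment(compound(binom, Q, 9))         # alpha = 1
m_cpo = third_moment(compound(poisson, Q, 121))       # alpha = 0
```

Here `m_binom` sits strictly below `bound`, while `m_cpo` agrees with it up to truncation and rounding error.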



In [12], the characterization of the Poisson as a maximum entropy distribution was proved through the decrease of its score function. In an analogous way, we define the score function of a Q-compound random variable as follows.

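Definition 2.4 Given a distribution $P$ on $\mathbb{Z}_+$ with mean $\lambda$, the corresponding Q-compound distribution $C_Q P$ has score function defined by:

$$r_{1,C_Q P}(x) = \frac{\sum_{y=0}^{\infty} (y+1)P(y+1)\, Q^{*y}(x)}{\lambda \sum_{y=0}^{\infty} P(y)\, Q^{*y}(x)} - 1, \qquad x \geq 0. \tag{14}$$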

Notice that the mean of $r_{1,C_Q P}$ with respect to $C_Q P$ is zero, and that if $P \sim \mathrm{Po}(\lambda)$ then $r_{1,C_Q P}(x) \equiv 0$. Further, when $Q$ is the point mass at 1 this score function reduces to the "scaled score function" introduced in [18]. But, unlike the scaled score function and the alternative score function $r_{2,C_Q P}$ given in [3], this score function is not only a function of the compound distribution $C_Q P$, but also explicitly depends on $P$. A projection identity and other properties of $r_{1,C_Q P}$ are proved in [H].

Next we show that, if $Q$ is log-concave and $P$ is ultra log-concave, then the score function $r_{1,C_Q P}(x)$ is decreasing in $x$.

Lemma 2.5 If $P$ is ultra log-concave and the compounding distribution $Q$ is log-concave, then the score function $r_{1,C_Q P}(x)$ of $C_Q P$ is decreasing in $x$.
Proof First we recall Theorem 2.1 of Keilson and Sumita [17], which implies that, if $Q$ is log-concave, then for any $m \geq n$, and for any $x$:

$$Q^{*m}(x+1)\,Q^{*n}(x) - Q^{*m}(x)\,Q^{*n}(x+1) \geq 0. \tag{15}$$

[This can be proved by considering $Q^{*m}$ as the convolution of $Q^{*n}$ and $Q^{*(m-n)}$, and writing

$$Q^{*m}(x+1)Q^{*n}(x) - Q^{*m}(x)Q^{*n}(x+1) = \sum_{l} Q^{*(m-n)}(l)\Big(Q^{*n}(x+1-l)Q^{*n}(x) - Q^{*n}(x-l)Q^{*n}(x+1)\Big).$$

Since $Q$ is log-concave, then so is $Q^{*n}$, cf. [15], so the ratio $Q^{*n}(x+1)/Q^{*n}(x)$ is decreasing in $x$, and (15) follows.]



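By definition, $r_{1,C_Q P}(x) \geq r_{1,C_Q P}(x+1)$ if and only if,

$$0 \leq \Big(\sum_{y} (y+1)P(y+1)Q^{*y}(x)\Big)\Big(\sum_{z} P(z)Q^{*z}(x+1)\Big) - \Big(\sum_{y} (y+1)P(y+1)Q^{*y}(x+1)\Big)\Big(\sum_{z} P(z)Q^{*z}(x)\Big)$$
$$= \sum_{y,z} (y+1)P(y+1)P(z)\big[Q^{*y}(x)Q^{*z}(x+1) - Q^{*y}(x+1)Q^{*z}(x)\big]. \tag{16}$$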



Noting that for $y = z$ the term in square brackets in the double sum becomes zero, and swapping the values of $y$ and $z$ in the range $y > z$, the double sum in (16) becomes,

$$\sum_{y < z} \big[(y+1)P(y+1)P(z) - (z+1)P(z+1)P(y)\big]\big[Q^{*y}(x)Q^{*z}(x+1) - Q^{*y}(x+1)Q^{*z}(x)\big].$$

By the ultra log-concavity of $P$, the first square bracket is positive for $y < z$, and by equation (15) the second square bracket is also positive for $y < z$. □
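The monotonicity in Lemma 2.5 can be observed numerically. The sketch below assumes the score function has the form $r_{1,C_Q P}(x) = \sum_y (y+1)P(y+1)Q^{*y}(x) / (\lambda\, C_Q P(x)) - 1$, which is the normalization consistent with (16) and with the zero-mean property; the binomial $P$, the particular log-concave $Q$ and all helper names are illustrative assumptions.

```python
from math import comb


def convolve(a, b, size):
    """Convolve two mass functions (lists indexed by 0, 1, ...), keeping `size` terms."""
    out = [0.0] * size
    for i, ai in enumerate(a[:size]):
        for j, bj in enumerate(b[:size - i]):
            out[i + j] += ai * bj
    return out


def score_function(P, Q, size):
    """Return (r, c) with c[x] = C_Q P(x) and r[x] the assumed score at x.

    Requires C_Q P(x) > 0 for every x < size.
    """
    lam = sum(y * py for y, py in enumerate(P))
    shifted = [(y + 1) * P[y + 1] for y in range(len(P) - 1)] + [0.0]
    power = [1.0] + [0.0] * (size - 1)        # Q^{*0}
    num = [0.0] * size
    den = [0.0] * size
    for py, sy in zip(P, shifted):
        for x in range(size):
            den[x] += py * power[x]
            num[x] += sy * power[x]
        power = convolve(power, Q, size)
    r = [num[x] / (lam * den[x]) - 1.0 for x in range(size)]
    return r, den


P = [comb(6, y) * 0.4**y * 0.6**(6 - y) for y in range(7)]   # Bin(6, 0.4), ultra log-concave
Q = [0.0, 0.5, 0.3, 0.15, 0.05]                              # log-concave on {1, 2, 3, 4}
r, c = score_function(P, Q, 25)                              # support of C_Q P is {0, ..., 24}
decreasing = all(r[x] >= r[x + 1] - 1e-9 for x in range(len(r) - 1))
mean_score = sum(ci * ri for ci, ri in zip(c, r))
```

In this example `decreasing` is true and `mean_score` vanishes up to rounding, in line with the lemma and with the zero-mean remark preceding it.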



We remark that, under the same assumptions, and using a very similar argument, an analogous result holds for the score function $r_{2,C_Q P}$ recently introduced in [3].

Combining Lemmas 2.5 and 2.3 with equation (13) we deduce the following result, which is the main technical step in the proof of Theorem 1.5 below.



Proposition 2.6 Let $P$ be an ultra log-concave distribution on $\mathbb{Z}_+$ with mean $\lambda > 0$, and assume that $Q$ and $\mathrm{CPo}(\lambda, Q)$ are both log-concave. Let $W_\alpha$ be a random variable with distribution $U_\alpha^Q P$, and define, for all $\alpha \in [0, 1]$, the function,

$$E(\alpha) := E[-\log C_Q \Pi_\lambda(W_\alpha)].$$

Then $E(\alpha)$ is continuous for all $\alpha \in [0, 1]$, it is differentiable for $\alpha \in (0, 1)$, and, moreover, $E'(\alpha) \leq 0$ for $\alpha \in (0, 1)$. In particular, $E(0) \geq E(1)$.
Proof Recall that,

$$U_\alpha^Q P(x) = C_Q U_\alpha P(x) = \sum_{y=0}^{\infty} U_\alpha P(y)\, Q^{*y}(x) = \sum_{y=0}^{x} U_\alpha P(y)\, Q^{*y}(x),$$

where the last sum is restricted to the range $0 \leq y \leq x$, because $Q$ is supported on $\mathbb{N}$. Therefore, since $U_\alpha P(x)$ is continuous in $\alpha$ [12], so is $U_\alpha^Q P(x)$, and to show that $E(\alpha)$ is continuous it suffices to show that the series,

$$E(\alpha) := E[-\log C_Q \Pi_\lambda(W_\alpha)] = -\sum_{x=0}^{\infty} U_\alpha^Q P(x) \log C_Q \Pi_\lambda(x), \tag{17}$$

converges uniformly. To that end, first observe that log-concavity of $C_Q \Pi_\lambda$ implies that $Q(1)$ is nonzero. [Otherwise, if $i > 1$ is the smallest integer such that $Q(i) \neq 0$, then $C_Q \Pi_\lambda(i+1) = 0$, but $C_Q \Pi_\lambda(i)$ and $C_Q \Pi_\lambda(2i)$ are both strictly positive, contradicting the log-concavity of $C_Q \Pi_\lambda$.] Since $Q(1)$ is nonzero, we can bound the compound Poisson probabilities as,

$$1 \geq C_Q \Pi_\lambda(x) = \sum_{y} \Big[\frac{e^{-\lambda}\lambda^y}{y!}\Big] Q^{*y}(x) \geq e^{-\lambda}\Big[\frac{\lambda^x}{x!}\Big] Q(1)^x, \qquad \text{for all } x \geq 1,$$

so that the summands in (17) can be bounded,

$$0 \leq -\log C_Q \Pi_\lambda(x) \leq \lambda + \log x! - x\log(\lambda Q(1)) \leq Cx^2, \qquad x \geq 1, \tag{18}$$

for a constant $C > 0$ that depends only on $\lambda$ and $Q(1)$. Therefore, for any $N \geq 1$, the tail of the series (17) can be bounded,

$$0 \leq -\sum_{x=N}^{\infty} U_\alpha^Q P(x) \log C_Q \Pi_\lambda(x) \leq C\, E\big[W_\alpha^2\, \mathbb{1}\{W_\alpha \geq N\}\big] \leq \frac{C}{N} E[W_\alpha^3],$$

and, in view of Lemma 2.3, it converges uniformly.

Therefore, $E(\alpha)$ is continuous in $\alpha$, and, in particular, convergent for all $\alpha \in [0,1]$. To prove that it is differentiable at each $\alpha \in (0, 1)$ we need to establish that: (i) the summands in (17) are continuously differentiable in $\alpha$ for each $x$; and (ii) the series of derivatives converges uniformly.

Since, as noted above, $U_\alpha^Q P(x)$ is defined by a finite sum, we can differentiate with respect to $\alpha$ under the sum, to obtain,

$$\frac{\partial}{\partial\alpha} U_\alpha^Q P(x) = \frac{\partial}{\partial\alpha} C_Q U_\alpha P(x) = \sum_{y=0}^{x} \frac{\partial}{\partial\alpha} U_\alpha P(y)\, Q^{*y}(x). \tag{19}$$

And since $U_\alpha P$ is continuously differentiable in $\alpha \in (0, 1)$ for each $x$ (cf. [12, Proposition 3.6] or equation (13) above), so are the summands in (17), establishing (i); in fact, they are infinitely differentiable, which can be seen by repeated applications of (13). To show that the series of derivatives converges uniformly, let $\alpha$ be restricted in an arbitrary open interval $(\epsilon, 1)$ for some $\epsilon > 0$. The relation (13) combined with (19) yields, for any $x$,

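$$\frac{\partial}{\partial\alpha} U_\alpha^Q P(x) = \frac{1}{\alpha}\sum_{y=0}^{x} \Big(\lambda\big(U_\alpha P(y) - U_\alpha P(y-1)\big) - \big((y+1)U_\alpha P(y+1) - yU_\alpha P(y)\big)\Big) Q^{*y}(x)$$
$$= -\frac{1}{\alpha}\sum_{y=0}^{x} \big((y+1)U_\alpha P(y+1) - \lambda U_\alpha P(y)\big)\big(Q^{*y}(x) - Q^{*(y+1)}(x)\big)$$
$$= -\frac{1}{\alpha}\sum_{y=0}^{x} \big((y+1)U_\alpha P(y+1) - \lambda U_\alpha P(y)\big) Q^{*y}(x) + \frac{1}{\alpha}\sum_{v=0}^{x} Q(v) \sum_{y=0}^{x} \big((y+1)U_\alpha P(y+1) - \lambda U_\alpha P(y)\big) Q^{*y}(x-v)$$
$$= -\frac{\lambda}{\alpha}\, U_\alpha^Q P(x)\, r_{1,U_\alpha^Q P}(x) + \frac{\lambda}{\alpha}\sum_{v=0}^{x} Q(v)\, U_\alpha^Q P(x-v)\, r_{1,U_\alpha^Q P}(x-v). \tag{20}$$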



Also, for any $x$, by definition,

$$\big|U_\alpha^Q P(x)\, r_{1,U_\alpha^Q P}(x)\big| \leq C_Q (U_\alpha P)^\#(x) + U_\alpha^Q P(x),$$

where, for any distribution $P$, we write $P^\#(y) = P(y+1)(y+1)/\lambda$ for its size-biased version. Hence for any $N \geq 1$, equations (20) and (18) yield the bound,

$$\Big|\sum_{x=N}^{\infty} \frac{\partial}{\partial\alpha} U_\alpha^Q P(x) \log C_Q \Pi_\lambda(x)\Big| \leq \sum_{x=N}^{\infty} \frac{Cx^2}{\alpha}\Big\{ C_Q(U_\alpha P)^\#(x) + U_\alpha^Q P(x) + \sum_{v=0}^{x} Q(v)\big[C_Q(U_\alpha P)^\#(x-v) + U_\alpha^Q P(x-v)\big]\Big\}$$
$$\leq \frac{C'}{N\alpha}\Big\{E[V_\alpha^3] + E[W_\alpha^3] + E[X^3]\Big\},$$

where $C, C' > 0$ are appropriate finite constants, and the random variables $V_\alpha \sim C_Q(U_\alpha P)^\#$, $W_\alpha \sim U_\alpha^Q P$ and $X \sim Q$ are independent. Lemma 2.3 implies that this bound converges to zero uniformly in $\alpha \in (\epsilon, 1)$, as $N \to \infty$. Since $\epsilon > 0$ was arbitrary, this establishes that $E(\alpha)$ is differentiable for all $\alpha \in (0, 1)$ and, in fact, that we can differentiate the series (17) term-by-term, to obtain,



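$$\frac{\partial}{\partial\alpha} E(\alpha) = -\sum_{x=0}^{\infty} \frac{\partial}{\partial\alpha} U_\alpha^Q P(x) \log C_Q \Pi_\lambda(x)$$
$$= \frac{\lambda}{\alpha}\sum_{x=0}^{\infty} \Big(U_\alpha^Q P(x)\, r_{1,U_\alpha^Q P}(x) - \sum_{v=0}^{x} Q(v)\, U_\alpha^Q P(x-v)\, r_{1,U_\alpha^Q P}(x-v)\Big) \log C_Q \Pi_\lambda(x)$$
$$= \frac{\lambda}{\alpha}\sum_{x=0}^{\infty} U_\alpha^Q P(x)\, r_{1,U_\alpha^Q P}(x) \Big(\log C_Q \Pi_\lambda(x) - \sum_{v=0}^{\infty} Q(v) \log C_Q \Pi_\lambda(x+v)\Big),$$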

where the second equality follows from using (20) above, and the rearrangement leading to the third equality follows by interchanging the order of (second) double summation and replacing $x$ by $x + v$.

Now we note that, exactly as in [12], the last series above is the covariance between the (zero-mean) function $r_{1,U_\alpha^Q P}(x)$ and the function $\big(\log C_Q \Pi_\lambda(x) - \sum_v Q(v) \log C_Q \Pi_\lambda(x+v)\big)$, under the measure $U_\alpha^Q P$. Since $P$ is ultra log-concave, so is $U_\alpha P$ [12]; hence the score function $r_{1,U_\alpha^Q P}(x)$ is decreasing in $x$, by Lemma 2.5. Also, the log-concavity of $C_Q \Pi_\lambda$ implies that the second function is increasing, and Chebyshev's rearrangement lemma implies that the covariance is less than or equal to zero, proving that $E'(\alpha) \leq 0$, as claimed.

Finally, the fact that $E(0) \geq E(1)$ is an immediate consequence of the continuity of $E(\alpha)$ on $[0, 1]$ and the fact that $E'(\alpha) \leq 0$ for all $\alpha \in (0, 1)$. □

Notice that, for the above proof to work, it is not necessary that $C_Q \Pi_\lambda$ be log-concave; the weaker property that $\big(\log C_Q \Pi_\lambda(x) - \sum_v Q(v) \log C_Q \Pi_\lambda(x+v)\big)$ be increasing is enough.

Proof of Theorem 1.5 As in Proposition 2.6, let $W_\alpha \sim U_\alpha^Q P = C_Q U_\alpha P$, and let $D(P\|Q)$ denote the relative entropy between $P$ and $Q$,

$$D(P\|Q) := \sum_{x \geq 0} P(x) \log\frac{P(x)}{Q(x)}.$$

Then, noting that $W_0 \sim C_Q \Pi_\lambda$ and $W_1 \sim C_Q P$, we have,

$$H(C_Q P) \leq H(C_Q P) + D(C_Q P \| C_Q \Pi_\lambda) = -E[\log C_Q \Pi_\lambda(W_1)] \leq -E[\log C_Q \Pi_\lambda(W_0)] = H(C_Q \Pi_\lambda),$$

where the first inequality is simply the nonnegativity of relative entropy, and the second inequality is exactly the statement that $E(1) \leq E(0)$, proved in Proposition 2.6. □
3 Maximum Entropy Property of the Compound Binomial Distribution

Here we prove the maximum entropy result for compound binomial random variables, Theorem 1.4. The proof, to some extent, parallels some of the earlier arguments which rely on differentiating the compound-sum probabilities $b_p(x)$ for a given parameter vector $p = (p_1, p_2, \ldots, p_n)$ (recall Definition 1.1 in the Introduction), with respect to an individual $p_i$. Using the representation,

$$C_Q b_p(y) = \sum_{x=0}^{n} b_p(x)\, Q^{*x}(y), \qquad y \geq 0, \tag{22}$$

differentiating $C_Q b_p(x)$ reduces to differentiating $b_p(x)$, and leads to an expression equivalent to that derived earlier in (20) for the derivative of $C_Q U_\alpha P$ with respect to $\alpha$.

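Lemma 3.1 Given a parameter vector $p = (p_1, p_2, \ldots, p_n)$, with $n \geq 2$ and each $0 \leq p_i \leq 1$, let,

$$p_t = \Big(\frac{p_1 + p_2}{2} + t,\ \frac{p_1 + p_2}{2} - t,\ p_3, \ldots, p_n\Big),$$

for $t \in [-(p_1 + p_2)/2, (p_1 + p_2)/2]$. Then,

$$\frac{\partial}{\partial t} C_Q b_{p_t}(x) = (-2t)\sum_{y=0}^{n-2} b_{\tilde p}(y)\Big(Q^{*(y+2)}(x) - 2Q^{*(y+1)}(x) + Q^{*y}(x)\Big), \tag{23}$$

where $\tilde p = (p_3, \ldots, p_n)$.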

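Proof Note that the sum of the entries of $p_t$ is constant as $t$ varies, and that $p_t = p$ for $t = (p_1 - p_2)/2$, while $p_t = ((p_1 + p_2)/2, (p_1 + p_2)/2, p_3, \ldots, p_n)$ for $t = 0$. Writing $k = p_1 + p_2$, $b_{p_t}$ can be expressed,

$$b_{p_t}(y) = \Big(\frac{k^2}{4} - t^2\Big) b_{\tilde p}(y-2) + \Big(k\Big(1 - \frac{k}{2}\Big) + 2t^2\Big) b_{\tilde p}(y-1) + \Big(\Big(1 - \frac{k}{2}\Big)^2 - t^2\Big) b_{\tilde p}(y),$$

and its derivative with respect to $t$ is

$$\frac{\partial}{\partial t} b_{p_t}(y) = -2t\big(b_{\tilde p}(y-2) - 2b_{\tilde p}(y-1) + b_{\tilde p}(y)\big).$$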
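The expression (22) for $C_Q b_{p_t}$ shows that it is a finite linear combination of compound-sum probabilities $b_{p_t}(x)$, so we can differentiate inside the sum to obtain,

$$\frac{\partial}{\partial t} C_Q b_{p_t}(x) = \sum_{y=0}^{n} \frac{\partial}{\partial t} b_{p_t}(y)\, Q^{*y}(x) = -2t\sum_{y=0}^{n} \big(b_{\tilde p}(y-2) - 2b_{\tilde p}(y-1) + b_{\tilde p}(y)\big) Q^{*y}(x),$$

and (23) follows upon re-indexing the sum, since $b_{\tilde p}(y) = 0$ for $y \leq -1$ and $y \geq n - 1$.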



□ 
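Formula (23) lends itself to a finite-difference check. The sketch below compares the right-hand side of (23) with a central difference of $C_Q b_{p_t}(x)$ in $t$; the parameter vector, the two-point $Q$ and the helper names are illustrative assumptions. Since $b_{p_t}(y)$ is quadratic in $t$, the central difference is exact up to rounding.

```python
def convolve(a, b, size):
    """Convolve two mass functions (lists indexed by 0, 1, ...), keeping `size` terms."""
    out = [0.0] * size
    for i, ai in enumerate(a[:size]):
        for j, bj in enumerate(b[:size - i]):
            out[i + j] += ai * bj
    return out


def bernoulli_sum(ps):
    """Mass function b_p of a sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in ps:
        pmf = convolve(pmf, [1.0 - p, p], len(pmf) + 1)
    return pmf


def compound_bernoulli_sum(ps, Q, size):
    """C_Q b_p via the finite mixture (22)."""
    power = [1.0] + [0.0] * (size - 1)
    out = [0.0] * size
    for by in bernoulli_sum(ps):
        for x in range(size):
            out[x] += by * power[x]
        power = convolve(power, Q, size)
    return out


def shifted(ps, t):
    """The vector p_t: first two entries replaced by (p1 + p2)/2 + t and (p1 + p2)/2 - t."""
    half = (ps[0] + ps[1]) / 2
    return [half + t, half - t] + list(ps[2:])


def rhs_23(ps, Q, t, size):
    """Right-hand side of (23), evaluated at x = 0, ..., size - 1."""
    b_tilde = bernoulli_sum(ps[2:])
    powers = [[1.0] + [0.0] * (size - 1)]          # Q^{*0}, ..., Q^{*n}
    for _ in range(len(ps)):
        powers.append(convolve(powers[-1], Q, size))
    return [-2 * t * sum(b * (powers[y + 2][x] - 2 * powers[y + 1][x] + powers[y][x])
                         for y, b in enumerate(b_tilde))
            for x in range(size)]


ps = [0.5, 0.2, 0.3, 0.6]
Q = [0.0, 0.6, 0.4]
t, h, size = 0.1, 1e-4, 9                          # support of C_Q b_p is {0, ..., 8}
plus = compound_bernoulli_sum(shifted(ps, t + h), Q, size)
minus = compound_bernoulli_sum(shifted(ps, t - h), Q, size)
numeric = [(a - b) / (2 * h) for a, b in zip(plus, minus)]
exact = rhs_23(ps, Q, t, size)
```

The two lists `numeric` and `exact` agree entrywise, and the entries of `exact` sum to zero, as they must for the derivative of a probability mass function.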



Next we state and prove the equivalent of Proposition 2.6 above:

Proposition 3.2 Suppose that the distribution $Q$ on $\mathbb{N}$ and the compound binomial distribution $\mathrm{CBin}(n, \lambda/n, Q)$ are both log-concave; let $p = (p_1, p_2, \ldots, p_n)$ be a given parameter vector with $n \geq 2$, $p_1 + p_2 + \cdots + p_n = \lambda > 0$, and $p_1 \geq p_2$; let $W_t$ be a random variable with distribution $C_Q b_{p_t}$; and define, for all $t \in [0, (p_1 - p_2)/2]$, the function,

$$E(t) := E[-\log C_Q b_{\bar p}(W_t)],$$

where $\bar p$ denotes the parameter vector with all entries equal to $\lambda/n$. If $Q$ satisfies either of the conditions: (a) $Q$ has finite support; or (b) $Q$ has tails heavy enough so that, for some $\rho, \beta > 0$ and $N_0 \geq 1$, we have, $Q(x) \geq \rho^{x^\beta}$, for all $x \geq N_0$, then $E(t)$ is continuous for all $t \in [0, (p_1 - p_2)/2]$, it is differentiable for $t \in (0, (p_1 - p_2)/2)$, and, moreover, $E'(t) \leq 0$ for $t \in (0, (p_1 - p_2)/2)$. In particular, $E(0) \geq E((p_1 - p_2)/2)$.

Proof The compound distribution $C_Q b_{p_t}$ is defined by the finite sum,

$$C_Q b_{p_t}(x) = \sum_{y=0}^{n} b_{p_t}(y)\, Q^{*y}(x),$$

and is, therefore, continuous in $t$. First, assume that $Q$ has finite support. Then so does $C_Q b_p$ for any parameter vector $p$, and the continuity and differentiability of $E(t)$ are trivial. In particular, the series defining $E(t)$ is a finite sum, so we can differentiate term-by-term, to obtain,

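$$E'(t) = -\sum_{x=0}^{\infty} \frac{\partial}{\partial t} C_Q b_{p_t}(x) \log C_Q b_{\bar p}(x)$$
$$= 2t\sum_{x=0}^{\infty}\sum_{y=0}^{n-2} b_{\tilde p}(y)\Big(Q^{*(y+2)}(x) - 2Q^{*(y+1)}(x) + Q^{*y}(x)\Big) \log C_Q b_{\bar p}(x) \tag{24}$$
$$= 2t\sum_{y=0}^{n-2}\sum_{z=0}^{\infty} b_{\tilde p}(y)\, Q^{*y}(z) \sum_{v,w} Q(v)Q(w)\Big[\log C_Q b_{\bar p}(z+v+w) - \log C_Q b_{\bar p}(z+v) - \log C_Q b_{\bar p}(z+w) + \log C_Q b_{\bar p}(z)\Big], \tag{25}$$

where (24) follows by Lemma 3.1. By assumption, the distribution $C_Q b_{\bar p} = \mathrm{CBin}(n, \lambda/n, Q)$ is log-concave, which implies that, for all $z, v, w$ such that $z + v + w$ is in the support of $\mathrm{CBin}(n, \lambda/n, Q)$,

$$\frac{C_Q b_{\bar p}(z)}{C_Q b_{\bar p}(z+v)} \leq \frac{C_Q b_{\bar p}(z+w)}{C_Q b_{\bar p}(z+v+w)}.$$

Hence the term in square brackets in equation (25) is negative, and the result follows.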



Now, suppose condition (b) holds on the tails of $Q$. First we note that the moments of $W_t$ are all uniformly bounded in $t$: Indeed, for any $\gamma > 0$,

$$E[W_t^\gamma] = \sum_{x=0}^{\infty} C_Q b_{p_t}(x)\, x^\gamma = \sum_{x=0}^{\infty}\sum_{y=0}^{n} b_{p_t}(y)\, Q^{*y}(x)\, x^\gamma \leq \sum_{y=0}^{n}\sum_{x=0}^{\infty} Q^{*y}(x)\, x^\gamma \leq C_n q_\gamma, \tag{26}$$

where $C_n$ is a constant depending only on $n$, and $q_\gamma$ is the $\gamma$th moment of $Q$, which is of course finite; recall property (ii) in the beginning of Section 2.

For the continuity of $E(t)$, it suffices to show that the series,

$$E(t) := E[-\log C_Q b_{\bar p}(W_t)] = -\sum_{x=0}^{\infty} C_Q b_{p_t}(x) \log C_Q b_{\bar p}(x), \tag{27}$$

converges uniformly. The tail assumption on $Q$ implies that, for all $x \geq N_0$,

$$1 \geq C_Q b_{\bar p}(x) = \sum_{y=0}^{n} b_{\bar p}(y)\, Q^{*y}(x) \geq \lambda(1 - \lambda/n)^{n-1} Q(x) \geq \lambda(1 - \lambda/n)^{n-1} \rho^{x^\beta},$$

so that,

$$0 \leq -\log C_Q b_{\bar p}(x) \leq Cx^\beta, \tag{28}$$

for an appropriate constant $C > 0$. Then, for $N \geq N_0$, the tail of the series (27) can be bounded,

$$0 \leq -\sum_{x=N}^{\infty} C_Q b_{p_t}(x) \log C_Q b_{\bar p}(x) \leq C\, E\big[W_t^\beta\, \mathbb{1}\{W_t \geq N\}\big] \leq \frac{C}{N} E[W_t^{\beta+1}] \leq \frac{C}{N} C_n q_{\beta+1},$$

where the last inequality follows from (26). This obviously converges to zero, uniformly in $t$, therefore $E(t)$ is continuous.

For the differentiability of $E(t)$, note that the summands in (27) are continuously differentiable (by Lemma 3.1), and that the series of derivatives converges uniformly in $t$; to see that, for $N \geq N_0$ we apply Lemma 3.1 together with the bound (28) to get,

$$\Big|\sum_{x=N}^{\infty} \frac{\partial}{\partial t} C_Q b_{p_t}(x) \log C_Q b_{\bar p}(x)\Big| \leq 2t\sum_{x=N}^{\infty}\sum_{y=0}^{n} b_{\tilde p}(y)\Big(Q^{*(y+2)}(x) + 2Q^{*(y+1)}(x) + Q^{*y}(x)\Big) Cx^\beta$$
$$\leq 2Ct\sum_{y=0}^{n}\sum_{x=N}^{\infty} \Big(Q^{*(y+2)}(x) + 2Q^{*(y+1)}(x) + Q^{*y}(x)\Big) x^\beta,$$

which is again easily seen to converge to zero uniformly in $t$ as $N \to \infty$, since $Q$ has finite moments of all orders. This establishes the differentiability of $E(t)$ and justifies the term-by-term differentiation of the series (27); the rest of the proof that $E'(t) \leq 0$ is the same as in case (a). □



Note that, as with Proposition 2.6, the above proof only requires that the compound binomial distribution $\mathrm{CBin}(n, \lambda/n, Q) = C_Q b_{\bar p}$ satisfies a property weaker than log-concavity, namely that the function, $\log C_Q b_{\bar p}(x) - \sum_v Q(v) \log C_Q b_{\bar p}(x+v)$, be increasing in $x$.

Proof of Theorem 1.4 Assume, without loss of generality, that $n \geq 2$. If $p_1 > p_2$, then Proposition 3.2 says that, $E((p_1 - p_2)/2) \leq E(0)$, that is,

$$-\sum_{x=0}^{\infty} C_Q b_p(x) \log C_Q b_{\bar p}(x) \leq -\sum_{x=0}^{\infty} C_Q b_{p^*}(x) \log C_Q b_{\bar p}(x),$$

where $p^* = ((p_1 + p_2)/2, (p_1 + p_2)/2, p_3, \ldots, p_n)$ and $\bar p = (\lambda/n, \ldots, \lambda/n)$. Since the expression $-\sum_{x=0}^{\infty} C_Q b_{p_t}(x) \log C_Q b_{\bar p}(x)$ is invariant under permutations of the elements of the parameter vectors, we deduce that it is maximized by $p_t = \bar p$. Therefore, using, as before, the nonnegativity of the relative entropy,

$$H(C_Q b_p) \leq H(C_Q b_p) + D(C_Q b_p \| C_Q b_{\bar p}) = -\sum_{x=0}^{\infty} C_Q b_p(x) \log C_Q b_{\bar p}(x)$$
$$\leq -\sum_{x=0}^{\infty} C_Q b_{\bar p}(x) \log C_Q b_{\bar p}(x) = H(C_Q b_{\bar p}) = H(\mathrm{CBin}(n, \lambda/n, Q)),$$

as claimed. □
4 Conditions for Log-Concavity

Theorems 1.5 and 1.4 state that log-concavity is a sufficient condition for compound binomial and compound Poisson distributions to have maximal entropy within a natural class. Here we give examples of when log-concavity holds; if the results in this section can be strengthened (in particular, if Conjecture 4.5 can be proved), then the class of maximum entropy distributions will be accordingly widened.

Below we show that a compound Bernoulli sum is log-concave if the parameters are sufficiently large, and that compound Bernoulli sums and compound Poisson distributions are log-concave if $Q$ is either supported only on the set $\{1, 2\}$ or is geometric.

Lemma 4.1 Suppose $Q$ is a log-concave distribution on $\mathbb{N}$.

(i) The compound Bernoulli distribution $\mathrm{CBern}(p, Q)$ is log-concave if and only if $p \geq \frac{1}{1 + Q(1)^2/Q(2)}$.

(ii) The compound Bernoulli sum distribution $C_Q b_p$ is log-concave as long as all the elements $p_i$ of the parameter vector $p = (p_1, p_2, \ldots, p_n)$ satisfy $p_i \geq \frac{1}{1 + Q(1)^2/Q(2)}$.

Proof Let $Y$ have distribution $\mathrm{CBern}(p, Q)$. Since $Q$ is log-concave itself, the log-concavity of $\mathrm{CBern}(p, Q)$ is equivalent to the inequality, $\Pr(Y = 1)^2 \geq \Pr(Y = 2)\Pr(Y = 0)$, which states that, $(pQ(1))^2 \geq (1-p)\,pQ(2)$, and this is exactly the assumption of (i).

The assertion in (ii) follows from (i), since the sum of independent log-concave random variables is log-concave; see, e.g., [15]. □
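The threshold in part (i) can be illustrated with a small computation. The sketch below uses one particular log-concave $Q$ chosen for illustration and checks the compound Bernoulli mass function on either side of $1/(1 + Q(1)^2/Q(2))$; the function names are hypothetical.

```python
def is_log_concave(p, tol=1e-12):
    """Interval support plus p[x]^2 >= p[x+1] * p[x-1] at every interior x."""
    support = [x for x, v in enumerate(p) if v > 0]
    if not support or support[-1] - support[0] + 1 != len(support):
        return False
    return all(p[x] ** 2 * (1 + tol) >= p[x + 1] * p[x - 1]
               for x in range(1, len(p) - 1))


def compound_bernoulli(p, Q):
    """Mass function of CBern(p, Q): mass 1 - p at 0 and p * Q(x) for x >= 1."""
    return [1.0 - p] + [p * q for q in Q[1:]]


def lemma_threshold(Q):
    """The bound 1 / (1 + Q(1)^2 / Q(2)) from part (i)."""
    return 1.0 / (1.0 + Q[1] ** 2 / Q[2])


Q = [0.0, 0.5, 0.3, 0.15, 0.05]          # log-concave on {1, 2, 3, 4}
threshold = lemma_threshold(Q)           # equals 6/11 for this Q
above = is_log_concave(compound_bernoulli(0.6, Q))
below = is_log_concave(compound_bernoulli(0.5, Q))
```

For this $Q$ the threshold is $6/11 \approx 0.545$, so $p = 0.6$ gives a log-concave compound Bernoulli distribution while $p = 0.5$ does not.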

Next we examine conditions under which a compound Poisson measure is log-concave. Our
argument is based, in part, on some of the ideas in Johnson and Goldschmidt [13], and
also in Wang and Yeh [26], where transformations that preserve log-concavity are studied.

Note that, unlike for the Poisson distribution, it is not the case that every compound Poisson
distribution $\mathrm{CPo}(\lambda, Q)$ is log-concave. Indeed, for any distribution P, considering the difference
$C_QP(1)^2 - C_QP(0)\,C_QP(2)$ shows that a necessary condition for $C_QP$ to be log-concave
is that,

$$\frac{P(1)^2 - P(0)P(2)}{P(0)P(1)} \ge \frac{Q(2)}{Q(1)^2}. \tag{29}$$

Taking P to be the $\mathrm{Po}(\lambda)$ distribution, a necessary condition for $\mathrm{CPo}(\lambda, Q)$ to be log-concave
is that,

$$\lambda \ge \frac{2Q(2)}{Q(1)^2}, \tag{30}$$

while for $P = b_p$, a necessary condition for the compound Bernoulli sum $C_Q b_p$ to be log-concave
is,

$$\sum_i \frac{p_i}{1-p_i} + \Big(\sum_i \frac{p_i^2}{(1-p_i)^2}\Big)\Big(\sum_i \frac{p_i}{1-p_i}\Big)^{-1} \ge \frac{2Q(2)}{Q(1)^2},$$

which, by Jensen's inequality, will hold as long as $\sum_i p_i \ge 2Q(2)/Q(1)^2$.
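For a Bernoulli sum $P = b_p$, the left-hand side of (29) can be written in terms of the odds $p_i/(1-p_i)$. The sketch below compares the two forms in exact arithmetic; the parameter vector `ps` and the function names are illustrative assumptions.

```python
from fractions import Fraction

def bernoulli_sum_pmf(ps):
    """pmf of a sum of independent Bernoulli(p_i) variables, by repeated convolution."""
    pmf = [Fraction(1)]
    for p in ps:
        new = [Fraction(0)] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += (1 - p) * mass      # the new variable equals 0
            new[k + 1] += p * mass        # the new variable equals 1
        pmf = new
    return pmf

def lhs_29(P):
    """(P(1)^2 - P(0) P(2)) / (P(0) P(1)), the left-hand side of (29)."""
    return (P[1] ** 2 - P[0] * P[2]) / (P[0] * P[1])

def odds_form(ps):
    """Half of sum_i r_i + (sum_i r_i^2) / (sum_i r_i), with r_i = p_i / (1 - p_i)."""
    r = [p / (1 - p) for p in ps]
    return (sum(r) + sum(ri * ri for ri in r) / sum(r)) / 2

# Hypothetical parameter vector.
ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
direct = lhs_29(bernoulli_sum_pmf(ps))
via_odds = odds_form(ps)
```

Multiplying (29) by two then gives a condition on $\sum_i r_i + \sum_i r_i^2 / \sum_i r_i$, and since each $r_i \ge p_i$ this quantity is at least $\sum_i p_i$.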
Theorem 4.2 Let Q be a distribution supported on the set {1, 2}.

(i) The compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ is log-concave for all $\lambda \ge 2Q(2)/Q(1)^2$.

(ii) The distribution $C_QP$ is log-concave for any ultra log-concave distribution P with support on
$\{0, 1, \ldots, N\}$ (where N may be infinite), which satisfies $(x+1)P(x+1)/P(x) \ge 2Q(2)/Q(1)^2$
for all $x = 0, 1, \ldots, N$.

Note that the second condition in (ii) is equivalent to requiring that $N P(N)/P(N-1) \ge
2Q(2)/Q(1)^2$ if N is finite, or that $\lim_{x \to \infty} (x+1)P(x+1)/P(x) \ge 2Q(2)/Q(1)^2$ if N is
infinite.
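As a sanity check of Theorem 4.2(ii), the sketch below builds $C_QP$ for a binomial P (which is ultra log-concave) and a Q on {1, 2}, using the fact that $Q^{*y}(x)$ places $x-y$ twos and $2y-x$ ones among y summands. The parameters N = 6, Q(2) = 1/5 and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def q_conv(y, x, p):
    """Q^{*y}(x) when Q(2) = p and Q(1) = 1 - p."""
    if not (y <= x <= 2 * y):
        return Fraction(0)
    return comb(y, x - y) * p ** (x - y) * (1 - p) ** (2 * y - x)

def compound(P, p):
    """C_Q P on {0, ..., 2N} for P given as a list on {0, ..., N}."""
    N = len(P) - 1
    return [sum(P[y] * q_conv(y, x, p) for y in range(N + 1))
            for x in range(2 * N + 1)]

def binomial_pmf(N, theta):
    return [comb(N, y) * theta ** y * (1 - theta) ** (N - y) for y in range(N + 1)]

def is_log_concave(seq):
    return all(seq[x] ** 2 >= seq[x - 1] * seq[x + 1]
               for x in range(1, len(seq) - 1))

p = Fraction(1, 5)                       # Q(2) = 1/5, Q(1) = 4/5
bound = 2 * p / (1 - p) ** 2             # 2 Q(2) / Q(1)^2 = 5/8

good = binomial_pmf(6, Fraction(1, 2))   # N P(N) / P(N-1) = 1 >= 5/8
bad = binomial_pmf(6, Fraction(1, 21))   # N P(N) / P(N-1) = 1/20 < 5/8
```

With these choices `compound(good, p)` passes the log-concavity check, while `compound(bad, p)` already fails at x = 1, in line with the necessary condition (29).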

Proof Writing $R(y) = y!\,P(y)$, we know that $C_QP(x) = \sum_{y=0}^{x} R(y)\,\big(Q^{*y}(x)/y!\big)$. Hence, the
log-concavity of $C_QP(x)$ is equivalent to showing that,

$$\sum_{r=x}^{2x} \frac{Q^{*r}(2x)}{r!} \sum_{y+z=r} R(y)R(z)\binom{r}{y}\left(\frac{Q^{*y}(x)\,Q^{*z}(x)}{Q^{*r}(2x)} - \frac{Q^{*y}(x+1)\,Q^{*z}(x-1)}{Q^{*r}(2x)}\right) \ge 0, \tag{31}$$

for all $x \ge 2$, since the case of $x = 1$ was dealt with previously by equation (29). In particular,
for (i), taking $P = \mathrm{Po}(\lambda)$, it suffices to show that for all r and x, the function,

$$g_{r,x}(k) := \sum_{y+z=r} \binom{r}{y} \frac{Q^{*y}(k)\,Q^{*z}(2x-k)}{Q^{*r}(2x)},$$

is unimodal as a function of k (since $g_{r,x}(k)$ is symmetric about x).

In the general case (ii), writing $Q(2) = p = 1 - Q(1)$, we have $Q^{*y}(x) = \binom{y}{x-y} p^{x-y} (1-p)^{2y-x}$,
so that,

$$\binom{r}{y} \frac{Q^{*y}(k)\,Q^{*z}(2x-k)}{Q^{*r}(2x)} = \binom{2x-r}{k-y}\binom{2r-2x}{2y-k}, \tag{32}$$

for any p, where $z = r - y$. Now, following [13, Lemma 2.4] and [26, Lemma 2.1], we use summation by parts
to show that the inner sum in (31) is positive for each r (except for r = x when x is odd), by
case-splitting according to the parity of r.

(a) For r = 2t, we rewrite the inner sum of equation (31) as,

$$\sum_{s=0}^{t} \big(R(t+s)R(t-s) - R(t+s+1)R(t-s-1)\big) \times \left(\sum_{y=t-s}^{t+s} \left[\binom{2x-r}{x-y}\binom{2r-2x}{2y-x} - \binom{2x-r}{x+1-y}\binom{2r-2x}{2y-x-1}\right]\right),$$

where the first term in the above product is positive by the ultra log-concavity of P (and hence
log-concavity of R), and the second term is positive by Lemma 4.3 below.
(b) Similarly, for $x \ne r = 2t + 1$, we rewrite the inner sum of equation (31) as,

$$\sum_{s=0}^{t} \big(R(t+s+1)R(t-s) - R(t+s+2)R(t-s-1)\big) \times \left(\sum_{y=t-s}^{t+1+s} \left[\binom{2x-r}{x-y}\binom{2r-2x}{2y-x} - \binom{2x-r}{x+1-y}\binom{2r-2x}{2y-x-1}\right]\right),$$

where the first term in the product is positive by the ultra log-concavity of P (and hence
log-concavity of R) and the second term is positive by Lemma 4.3 below.

(c) Finally, in the case of $x = r = 2t + 1$, substituting k = x and k = x + 1 in (32), combining
the resulting expression with (31), and noting that $\binom{0}{n}$ is 1 if and only if n = 0 (and is
zero, otherwise), we see that the inner sum becomes $-R(t+1)R(t)\binom{2t+1}{t}$, and the summands
in (31) reduce to,

$$-\frac{p^{x}\,R(t)R(t+1)}{(t+1)!\,t!}.$$

However, the next term in the outer sum of equation (31), r = x + 1, gives

$$\frac{p^{2t}(1-p)^2}{2(2t)!}\left[R(t+1)^2\left(2\binom{2t}{t} - \binom{2t}{t+1}\right) - R(t)R(t+2)\binom{2t}{t}\right] \ge \frac{p^{2t}(1-p)^2}{2(2t)!}\,R(t+1)^2\left(\binom{2t}{t} - \binom{2t}{t+1}\right) = \frac{p^{2t}(1-p)^2\,R(t+1)^2}{2(t+1)!\,t!}.$$

Hence, the sum of the first two terms is positive (and hence the whole sum is positive) if
$R(t+1)(1-p)^2/(2p) \ge R(t)$.

If P is Poisson($\lambda$), this simply reduces to equation (30); otherwise we use the fact that
$R(x+1)/R(x)$ is decreasing. □
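The combinatorial identity (32) drives the whole proof, and it can be verified by brute force for small parameters. The sketch below does so in exact arithmetic, with the convention that a binomial coefficient vanishes outside its usual range; the function names and the tested ranges are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def binom(n, k):
    """Binomial coefficient, taken to be zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

def q_conv(y, x, p):
    """Q^{*y}(x) when Q(2) = p and Q(1) = 1 - p."""
    if not (y <= x <= 2 * y):
        return Fraction(0)
    return comb(y, x - y) * p ** (x - y) * (1 - p) ** (2 * y - x)

def identity_32_holds(x, p):
    """Check (32) for every r in [x, 2x], y in [0, r] and k in [0, 2x]."""
    for r in range(x, 2 * x + 1):
        denom = q_conv(r, 2 * x, p)          # positive because x <= r <= 2x
        for y in range(r + 1):
            for k in range(2 * x + 1):
                lhs = (binom(r, y) * q_conv(y, k, p)
                       * q_conv(r - y, 2 * x - k, p) / denom)
                rhs = binom(2 * x - r, k - y) * binom(2 * r - 2 * x, 2 * y - k)
                if lhs != rhs:
                    return False
    return True
```

The right-hand side does not involve p, so the same check should pass for any 0 < p < 1, for example `identity_32_holds(4, Fraction(1, 3))` and `identity_32_holds(4, Fraction(3, 4))`.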

Lemma 4.3 (a) If r = 2t, for any $0 \le s \le t$, the sum,

$$\sum_{y=t-s}^{t+s} \left(\binom{2x-r}{x-y}\binom{2r-2x}{2y-x} - \binom{2x-r}{x+1-y}\binom{2r-2x}{2y-x-1}\right) \ge 0.$$

(b) If $x \ne r = 2t + 1$, for any $0 \le s \le t$, the sum,

$$\sum_{y=t-s}^{t+1+s} \left(\binom{2x-r}{x-y}\binom{2r-2x}{2y-x} - \binom{2x-r}{x+1-y}\binom{2r-2x}{2y-x-1}\right) \ge 0.$$



Proof The proof is in two stages; first we show that the sum is positive for s = t, then we
show that there exists some S such that, as s increases, the increments are positive for s < S
and negative for s > S. The result then follows, as in [13] or [26].
For both (a) and (b), note that for s = t, equation (32) implies that the sum is the difference
between the coefficients of $T^x$ and $T^{x+1}$ in $f_{r,x}(T) = (1+T^2)^{2x-r}(1+T)^{2r-2x}$. Since $f_{r,x}(T)$
has degree 2x and has coefficients which are symmetric about $T^x$, it is enough to show that
the coefficients form a unimodal sequence. Now, $(1+T^2)^{2x-r}(1+T)$ has coefficients which do
form a unimodal sequence. Statement S1 of Keilson and Gerber [16] states that any binomial
distribution is strongly unimodal, which means that it preserves unimodality on convolution.
This means that $(1+T^2)^{2x-r}(1+T)^{2r-2x}$ is unimodal if $r - x \ge 1$, and we need only check
the case r = x, when $f_{r,x}(T) = (1+T^2)^{r}$. Note that if r = 2t is even, the difference between
the coefficients of $T^x$ and $T^{x+1}$ is $\binom{2t}{t}$, which is positive.

In part (a), the increments are equal to $\binom{2x-r}{x-t+s}\binom{2r-2x}{2t+2s-x}$ multiplied by the expression,

$$2 - \frac{(x-t-s)(2t-2s-x)}{(x+1-t+s)(2t+2s-x+1)} - \frac{(x-t+s)(2t+2s-x)}{(x+1-t-s)(2t-2s-x+1)},$$

which is positive for s small and negative for s large, since placing the term in brackets over a
common denominator, the numerator is of the form $(a - bs^2)$.

Similarly, in part (b), the increments equal $\binom{2x-r}{x-t+s}\binom{2r-2x}{2t+2+2s-x}$ times the expression,

$$2 - \frac{(x-t-s-1)(2t-2s-x)}{(x+1-t+s)(2t+2s-x+3)} - \frac{(x-t+s)(2t+2+2s-x)}{(x-t-s)(2t+1-2s-x)},$$

which is again positive for s small and negative for s large. □
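Lemma 4.3 involves only binomial coefficients, so it can be checked exhaustively for small x. The sketch below is such a brute-force check, not a substitute for the proof; the helper names and the range of x are illustrative assumptions, and binomial coefficients are taken to be zero outside their usual range.

```python
from math import comb

def binom(n, k):
    """Binomial coefficient, taken to be zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

def term(x, r, y):
    """Summand of Lemma 4.3 for the given x, r and y."""
    a, b = 2 * x - r, 2 * r - 2 * x
    return (binom(a, x - y) * binom(b, 2 * y - x)
            - binom(a, x + 1 - y) * binom(b, 2 * y - x - 1))

def partial_sum(x, r, s):
    """Sum over y = t-s..t+s when r = 2t, or y = t-s..t+1+s when r = 2t+1."""
    t = r // 2
    upper = t + s + (r % 2)
    return sum(term(x, r, y) for y in range(t - s, upper + 1))

def lemma_4_3_holds(max_x):
    """Check nonnegativity for all 2 <= x <= max_x, x <= r <= 2x and 0 <= s <= t."""
    for x in range(2, max_x + 1):
        for r in range(x, 2 * x + 1):
            if r == x and r % 2 == 1:
                continue                    # the case x = r odd is excluded in (b)
            if any(partial_sum(x, r, s) < 0 for s in range(r // 2 + 1)):
                return False
    return True
```

For instance, with x = 3 and r = 5 the full sum (s = 2) equals 1, the difference between the coefficients of $T^3$ and $T^4$ in $(1+T^2)(1+T)^4$.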

Theorem 4.4 Let Q be a geometric distribution on $\mathbb{N}$. Then $C_QP$ is log-concave for any
distribution P which is log-concave and satisfies the condition (29).

Proof If Q is geometric with mean $1/\alpha$, then $Q^{*y}(x) = \alpha^{y}(1-\alpha)^{x-y}\binom{x-1}{y-1}$, which implies
that,

$$C_QP(x) = \sum_{y=0}^{x} P(y)\,\alpha^{y}(1-\alpha)^{x-y}\binom{x-1}{y-1}.$$

Condition (29) ensures that $C_QP(1)^2 - C_QP(0)\,C_QP(2) \ge 0$, so, taking z = y - 1, we need
only prove that the sequence,

$$C(x) := \frac{C_QP(x+1)}{(1-\alpha)^{x}} = \sum_{z=0}^{x} \binom{x}{z}\frac{\alpha^{z+1}}{(1-\alpha)^{z}}\,P(z+1),$$

is log-concave. However, this follows immediately from [15, Theorem 7.3], which proves that
if $\{a_i\}$ is a log-concave sequence, then so is $\{b_i\}$, defined by $b_i = \sum_{j=0}^{i}\binom{i}{j}a_j$. □
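Theorem 4.4 can be illustrated with a geometric Q and Poisson weights for P. The sketch below works with $P(y) \propto \lambda^y/y!$, dropping the factor $e^{-\lambda}$ since a constant multiple does not affect log-concavity; the choice $\alpha = 1/2$, the range of x, and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial

def geometric_compound(P, alpha, n):
    """C_Q P(x) for x = 0..n-1, with Q(j) = alpha * (1 - alpha)**(j - 1), j >= 1."""
    values = [P(0)]
    for x in range(1, n):
        values.append(sum(P(y) * alpha ** y * (1 - alpha) ** (x - y)
                          * comb(x - 1, y - 1)
                          for y in range(1, x + 1)))
    return values

def poisson_weights(lam):
    """Po(lam) up to the constant factor exp(-lam)."""
    return lambda y: Fraction(lam) ** y / factorial(y)

def is_log_concave(seq):
    return all(seq[x] ** 2 >= seq[x - 1] * seq[x + 1]
               for x in range(1, len(seq) - 1))

alpha = Fraction(1, 2)   # Q(2) / Q(1)^2 = (1 - alpha) / alpha = 1

# For Poisson P, condition (29) reads lam / 2 >= Q(2) / Q(1)^2, i.e. lam >= 2 here.
satisfies_29 = geometric_compound(poisson_weights(2), alpha, 30)
violates_29 = geometric_compound(poisson_weights(1), alpha, 30)
```

Only the first 30 values are examined, so this is a finite-range check; `satisfies_29` passes it, while `violates_29` fails already at x = 1.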

Finally, based on the discussion in the beginning of this section, the above results, and some
calculations of the quantities $C_Q\Pi_\lambda(x)^2 - C_Q\Pi_\lambda(x-1)\,C_Q\Pi_\lambda(x+1)$ for small x, we make the
following conjecture:








Conjecture 4.5 The compound Poisson measure $\mathrm{CPo}(\lambda, Q)$ is log-concave, as long as Q is
log-concave and $\lambda Q(1)^2 \ge 2Q(2)$.



The condition $\lambda Q(1)^2 \ge 2Q(2)$ is, of course, necessary; recall the argument leading to
equation (30) above.
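The small-x calculations behind Conjecture 4.5 can be reproduced along the following lines. The sketch computes $\mathrm{CPo}(\lambda, Q)$, up to the factor $e^{-\lambda}$, by the standard Panjer recursion $g(x) = (\lambda/x)\sum_{j} j\,Q(j)\,g(x-j)$ with $g(0) = 1$, and reports the first x at which log-concavity fails. The recursion is a swapped-in computational device rather than the authors' method, and the distribution `Q`, the range of x, and the names are illustrative assumptions.

```python
from fractions import Fraction

def cpo_weights(lam, q, n):
    """g(0..n-1) with CPo(lam, Q)(x) = exp(-lam) * g(x); q[j-1] = Q(j)."""
    g = [Fraction(1)]
    for x in range(1, n):
        s = sum(j * q[j - 1] * g[x - j] for j in range(1, min(x, len(q)) + 1))
        g.append(lam * s / x)
    return g

def first_violation(g):
    """Smallest x with g(x)^2 < g(x-1) g(x+1), or None if there is none in range."""
    for x in range(1, len(g) - 1):
        if g[x] ** 2 < g[x - 1] * g[x + 1]:
            return x
    return None

# Hypothetical log-concave Q on {1, 2, 3}; here 2 Q(2) / Q(1)^2 = 8/3.
Q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
lam_bound = 2 * Q[1] / Q[0] ** 2

above = first_violation(cpo_weights(Fraction(3), Q, 40))   # lam = 3 > 8/3
below = first_violation(cpo_weights(Fraction(2), Q, 40))   # lam = 2 < 8/3
```

A result of `None` on a finite range is only numerical evidence for the conjecture, whereas a reported index is a genuine counterexample to log-concavity for that value of $\lambda$.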

In closing, we list some known results that are related to this conjecture and may be useful in 
proving (or disproving) it: 



1. Theorem 2.3 of Steutel and van Harn [23] shows that, if $\{iQ(i)\}$ is a decreasing sequence,
then $\mathrm{CPo}(\lambda, Q)$ is a unimodal distribution (recall that log-concavity implies unimodality).
Interestingly, the same condition provides a dichotomy of results in compound Poisson
approximation bounds as developed in [2]: If $\{iQ(i)\}$ is decreasing, the bounds are of the
same form and order as in the simple Poisson case, while if it is not, the bounds are much
larger.

2. Theorem 3.2 of Cai and Willmot [5] shows that if $\{Q(i)\}$ is decreasing then the distribution
function of the compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ is log-concave.

3. A conjecture similar to Conjecture 4.5 is that, for log-concave Q, if $\mathrm{CPo}(\lambda, Q)$ is log-concave,
then so is $\mathrm{CPo}(\mu, Q)$, for all $\mu \ge \lambda$. Theorem 4.9 of Keilson and Sumita [TT]
proves the related result that, if Q is log-concave, then, for any n, the ratio,

$$\frac{C_Q\Pi_\lambda(n)}{C_Q\Pi_\lambda(n+1)}$$

is decreasing in $\lambda$.
Appendix

Proof of Lemma 2.3 Recall that, as stated in properties (ii) and (iii) in the beginning of
Section 2, Q has finite moments of all orders, and that the nth falling factorial moment of any
ultra log-concave random variable Y with distribution R on $\mathbb{Z}_+$ is bounded above by $(E(Y))^n$.
Now for an arbitrary ultra log-concave distribution R, define random variables $Y \sim R$ and
$Z \sim C_QR$. If $r_1, r_2, r_3$ denote the first three moments of $Y \sim R$, then,

$$
\begin{aligned}
E(Z^3) &= q_3 r_1 + 3 q_1 q_2\,E[(Y)_2] + q_1^3\,E[(Y)_3] \\
&\le q_3 r_1 + 3 q_1 q_2\,r_1^2 + q_1^3\,r_1^3.
\end{aligned}
\tag{33}
$$

Since the map $U_\alpha$ preserves ultra log-concavity [12], if P is ultra log-concave then so is $R =
U_\alpha P$, so that (33) gives the required bound for the third moment of $W_\alpha$, upon noting that the
mean of the distribution $U_\alpha P$ is equal to $\lambda$.

Similarly, size-biasing preserves ultra log-concavity; that is, if R is ultra log-concave, then
so is $R^{\#}$, since $R^{\#}(x+1)(x+1)/R^{\#}(x) = (R(x+2)(x+2)(x+1))/(R(x+1)(x+1)) =
R(x+2)(x+2)/R(x+1)$ is also decreasing. Hence, $R' = (U_\alpha P)^{\#}$ is ultra log-concave, and (33)
applies in this case as well. In particular, noting that the mean of $Y' \sim R' = (U_\alpha P)^{\#} = R^{\#}$
can be bounded in terms of the mean of $Y \sim R$ as,

$$E(Y') = \frac{E[(Y)_2]}{E(Y)} \le E(Y),$$

the bound (33) yields the required bound for the third moment of $V_\alpha$. □
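The third-moment identity in (33) can be checked exactly on a small example. The sketch below assumes that $q_k$ denotes the k-th moment of Q and $(Y)_k$ the k-th falling factorial of Y; the binomial choice for Y (which is ultra log-concave), the distribution `Q`, and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def convolve(a, b):
    """Convolution of two finite sequences indexed from 0."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def compound_pmf(P, Q):
    """pmf of Z = X_1 + ... + X_Y with Y ~ P and i.i.d. X_i ~ Q (lists indexed from 0)."""
    total = [Fraction(0)] * ((len(P) - 1) * (len(Q) - 1) + 1)
    power = [Fraction(1)]                 # Q^{*0} is the point mass at 0
    for py in P:
        for x, mass in enumerate(power):
            total[x] += py * mass
        power = convolve(power, Q)        # Q^{*(y+1)} for the next term
    return total

def moment(pmf, k):
    return sum(Fraction(x) ** k * m for x, m in enumerate(pmf))

def falling_moment(pmf, k):
    """E[(Y)_k] = E[Y (Y-1) ... (Y-k+1)]."""
    total = Fraction(0)
    for x, m in enumerate(pmf):
        prod = 1
        for i in range(k):
            prod *= x - i
        total += prod * m
    return total

# Hypothetical example: Y ~ Binomial(3, 1/2), Q on {1, 2, 3}.
P = [comb(3, y) * Fraction(1, 8) for y in range(4)]
Q = [Fraction(0), Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
q1, q2, q3 = (moment(Q, k) for k in (1, 2, 3))
r1 = moment(P, 1)

lhs = moment(compound_pmf(P, Q), 3)
rhs = q3 * r1 + 3 * q1 * q2 * falling_moment(P, 2) + q1 ** 3 * falling_moment(P, 3)
upper = q3 * r1 + 3 * q1 * q2 * r1 ** 2 + q1 ** 3 * r1 ** 3
```

Here `lhs` and `rhs` agree exactly, and `upper` dominates both because the falling factorial moments of a binomial variable are bounded by the corresponding powers of its mean.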